

Walsh-Hadamard Variational Inference for Bayesian Deep Learning

Neural Information Processing Systems

Over-parameterized models, such as DeepNets and ConvNets, form a class of models that are routinely adopted in a wide variety of applications, and for which Bayesian inference is desirable but extremely challenging. Variational inference offers the tools to tackle this challenge in a scalable way and with some degree of flexibility on the approximation, but for over-parameterized models it suffers from the over-regularization property of the variational objective. Inspired by the literature on kernel methods, and in particular on structured approximations of distributions of random matrices, this paper proposes Walsh-Hadamard Variational Inference (WHVI), which uses Walsh-Hadamard-based factorization strategies to reduce the parameterization and accelerate computations, thus avoiding over-regularization issues with the variational objective. Extensive theoretical and empirical analyses demonstrate that WHVI yields considerable speedups and model reductions compared to other techniques for carrying out approximate inference in over-parameterized models, and ultimately show how advances in kernel methods can be translated into advances in approximate Bayesian inference for Deep Learning.
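The abstract only names the factorization strategy; concretely, WHVI parameterizes a weight matrix as W = S1 H diag(g) H S2, where H is the Walsh-Hadamard matrix (never materialized), S1 and S2 are learned diagonal scalings, and the variational posterior is placed over the vector g alone. The following is a minimal NumPy sketch of that structure, assuming a mean-field Gaussian over g with the reparameterization trick; the function and variable names are illustrative, not taken from the authors' code:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized, Sylvester ordering).
    Computes H @ x in O(D log D); D must be a power of two."""
    x = x.copy()
    d = x.shape[0]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: sums
            x[i + h:i + 2 * h] = a - b  # butterfly: differences
        h *= 2
    return x

def whvi_matvec(x, s1, s2, g):
    """Structured product W x with W = diag(s1) H diag(g) H diag(s2).
    Costs O(D log D) time and O(D) parameters, versus O(D^2) for a
    dense weight matrix."""
    return s1 * fwht(g * fwht(s2 * x))

# One stochastic forward pass through the structured layer.
rng = np.random.default_rng(0)
D = 8                                            # must be a power of two
mu = rng.normal(size=D)                          # variational mean of g
rho = -3.0 * np.ones(D)                          # pre-softplus std. dev.
sigma = np.log1p(np.exp(rho))                    # softplus keeps sigma > 0
g = mu + sigma * rng.normal(size=D)              # reparameterized sample
s1, s2 = rng.normal(size=D), rng.normal(size=D)  # learned scaling vectors
x = rng.normal(size=D)
y = whvi_matvec(x, s1, s2, g)
```

Because H is applied implicitly via the transform, the per-layer variational parameters scale linearly in D rather than quadratically, which is the source of the model reduction claimed in the abstract.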


Review for NeurIPS paper: Walsh-Hadamard Variational Inference for Bayesian Deep Learning

Neural Information Processing Systems

Weaknesses: The main weakness I see with this paper is its empirical evaluation, which could be more convincing. While the experiments on CNNs show that WHVI is competitive with other approaches on VGG16 while being more parameter-efficient (which is impressive), I am not sure how well this is aligned with the goal of the paper. I was under the impression that the goal was to improve Bayesian inference in deep neural networks (for which I would expect stronger results), but instead the goal might be to reduce the number of model parameters without sacrificing accuracy -- it would be great if the authors could clarify this. Furthermore, I would have liked to see a more extensive evaluation of uncertainty calibration, in both in-domain and especially out-of-domain settings, using e.g. the benchmarks proposed in Ovadia et al. 2019, which would further strengthen the paper. Also, the paper does not compare against state-of-the-art methods for deep uncertainty quantification such as deep ensembles (Lakshminarayanan et al. 2017, Ovadia et al. 2019), which makes it hard to assess the potential impact of the proposed approach.


Review for NeurIPS paper: Walsh-Hadamard Variational Inference for Bayesian Deep Learning

Neural Information Processing Systems

Walsh-Hadamard factorizations for variational posteriors are proposed. While reviewers appreciated the paper, the discussion brought to light several shared concerns, and R1 in particular has updated their review to reflect some of these points. It seems the proposed approaches are only applicable to a fully-connected last layer, and there was a sense in the discussion that the authors had dodged these questions rather than addressing them directly. Last-layer methods are certainly useful and widely used in practice, so this (significant) constraint would be acceptable if presented directly and honestly, alongside comparisons to such methods, e.g. the references [1,2] provided by R1.


Walsh-Hadamard Variational Inference for Bayesian Deep Learning

Rossi, Simone, Marmin, Sebastien, Filippone, Maurizio

arXiv.org Machine Learning

Over-parameterized models, such as DeepNets and ConvNets, form a class of models that are routinely adopted in a wide variety of applications, and for which Bayesian inference is desirable but extremely challenging. Variational inference offers the tools to tackle this challenge in a scalable way and with some degree of flexibility on the approximation, but for over-parameterized models it suffers from the over-regularization property of the variational objective. Inspired by the literature on kernel methods, and in particular on structured approximations of distributions of random matrices, this paper proposes Walsh-Hadamard Variational Inference (WHVI), which uses Walsh-Hadamard-based factorization strategies to reduce the parameterization and accelerate computations, thus avoiding over-regularization issues with the variational objective. Extensive theoretical and empirical analyses demonstrate that WHVI yields considerable speedups and model reductions compared to other techniques for carrying out approximate inference in over-parameterized models, and ultimately show how advances in kernel methods can be translated into advances in approximate Bayesian inference.